Unsupervised Detection of Violent Content in Arabic Social Media
Abstract
A monitoring system is proposed to detect violent content in Arabic social media. This is a new and challenging task because of the many Arabic dialects used on social media and the non-violent contexts in which violent words may appear. We propose a probabilistic nonlinear dimensionality reduction technique, the sparse Gaussian process latent variable model (SGPLVM), followed by k-means to separate violent from non-violent content. This framework does not require any labelled corpora for training. We show that violent and non-violent Arabic tweets are not separable using k-means in the original high-dimensional space; however, better results are achieved by clustering in the low-dimensional latent space of the SGPLVM.
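The clustering stage of the pipeline described above can be illustrated with a minimal sketch. It assumes the SGPLVM step has already mapped each tweet to a point in a low-dimensional latent space; that step is not shown here. The function name `kmeans`, the seed, and the six 2-D latent coordinates are all hypothetical and chosen only for illustration, not taken from the paper.

```python
import math
import random


def kmeans(points, k=2, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points (illustrative choice).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centroids


# Hypothetical 2-D latent coordinates, standing in for SGPLVM output.
latent = [(-2.1, 0.3), (-1.8, -0.2), (-2.4, 0.1),
          (1.9, 0.4), (2.2, -0.1), (2.0, 0.2)]
labels, centroids = kmeans(latent, k=2)
```

With k = 2 the two clusters play the role of the violent and non-violent groups. Which cluster index corresponds to which group is arbitrary, so in an unsupervised setting the clusters still have to be interpreted after the fact.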
Similar resources
Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media
Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyberbullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "tweets" via one of the machine l...
Understanding and Discovering Deliberate Self-harm Content in Social Media
Studies suggest that self-harm users found it easier to discuss self-harm-related thoughts and behaviors using social media than in the physical world. Given the enormous and increasing volume of social media data, on-line self-harm content is likely to be buried rapidly by other normal content. To enable voices of self-harm users to be heard, it is important to distinguish self-harm content fr...
Towards a Corpus of Violence Acts in Arabic Social Media
In this paper we present a new corpus of Arabic tweets that mention some form of violent event, developed to support the automatic identification of human rights abuses and different violent acts. The dataset was manually labelled for seven classes of violence using crowdsourcing. Only tweets classified with a high degree of agreement were included in the final dataset.
Television and Video Game Violence: Age Differences and the Combined Effects of Passive and Interactive Violent Media
The present research examined the combined effects of violent video games and violent TV programs on third- and sixth-grade boys’ thoughts and behavior. In individual sessions, demographic information about the children’s television viewing and video game playing habits was collected. Participants were exposed to one of the following six media conditions for 15 minutes: a) watch a violent (wrestling...
User sentiment detection: a YouTube use case
In this paper we propose an unsupervised lexicon-based approach to detect the sentiment polarity of user comments in YouTube. Polarity detection in social media content is challenging not only because of the existing limitations in current sentiment dictionaries but also due to the informal linguistic styles used by users. Present dictionaries fail to capture the sentiments of community-created...